Blind men and elephants: What do citation summaries tell us about a research article?
The old Asian legend about the blind men and the elephant comes to mind when looking at how different authors of scientific papers describe a piece of related prior work. It turns out that different citations to the same paper often focus on different aspects of that paper, and that no single citation provides a full description of its contributions. In this article, we describe our investigation of this phenomenon. We studied citation summaries in the context of research papers in the biomedical domain. A citation summary is the set of citing sentences for a given article and can be used as a surrogate for the actual article in a variety of scenarios. It contains information that was deemed by peers to be important. Our study shows that citation summaries overlap to some extent with the abstracts of the papers, but also differ from them in that they focus on different aspects of those papers than the abstracts do. In addition, co-cited articles (pairs of articles cited by the same article) tend to be similar. We present results based on a lexical similarity metric called cohesion to support these claims.

Peer Reviewed
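The abstract describes cohesion only as a lexical similarity metric over citing sentences. A minimal sketch of one plausible instantiation, assuming average pairwise Jaccard similarity over word sets (the paper's exact formulation may differ):

```python
from itertools import combinations

def tokens(sentence):
    """Lowercase word tokens of a sentence (naive whitespace split)."""
    return set(sentence.lower().split())

def cohesion(citing_sentences):
    """Average pairwise Jaccard similarity across a citation summary.

    Higher values mean the citing sentences reuse the same vocabulary,
    i.e. they emphasize similar aspects of the cited paper; lower values
    suggest each citer saw a different part of the "elephant".
    """
    pairs = list(combinations(citing_sentences, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        ta, tb = tokens(a), tokens(b)
        total += len(ta & tb) / len(ta | tb)
    return total / len(pairs)
```

The same function applied to a paper's abstract sentences versus its citation summary would expose the overlap-but-not-identity pattern the study reports.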
Continuous low-dose antibiotic prophylaxis for adults with repeated urinary tract infections (AnTIC): a randomised, open-label trial
Funder: UK National Institute for Health Research. Open Access funded by the UK Department of Health. Acknowledgments: We thank all the participants for their commitment to the study, Sheila Wallace for updating the systematic review, and the members of the Trial Steering Committee and the Data Monitoring Committee for their valuable guidance. We thank the National Health Service organisations, principal investigators, and local research staff who hosted and ran the study at each site. We thank the Health Technology Assessment Programme of the UK NIHR for funding the study (no. 11/72/01). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the UK Government Department of Health. A full report of the study [30] has been published by the NIHR Library.

Peer reviewed. Publisher PDF.
Continuous low-dose antibiotic prophylaxis to prevent urinary tract infection in adults who perform clean intermittent self-catheterisation: the AnTIC RCT
Peer reviewed. Publisher PDF.
Unsupervised Paraphrasing via Deep Reinforcement Learning
Paraphrasing is expressing the meaning of an input sentence in different wording while maintaining fluency (i.e., grammatical and syntactic correctness). Most existing work on paraphrasing uses supervised models that are limited to specific domains (e.g., image captions). Such models can neither be straightforwardly transferred to other domains nor generalize well, and creating labeled training data for new domains is expensive and laborious. The need for paraphrasing across different domains and the scarcity of labeled training data in many such domains call for exploring unsupervised paraphrase generation methods. We propose Progressive Unsupervised Paraphrasing (PUP): a novel unsupervised paraphrase generation method based on deep reinforcement learning (DRL). PUP uses a variational autoencoder (trained on a non-parallel corpus) to generate a seed paraphrase that warm-starts the DRL model. PUP then progressively tunes the seed paraphrase, guided by our novel reward function, which combines semantic adequacy, language fluency, and expression diversity measures to quantify the quality of the generated paraphrases at each iteration without needing parallel sentences. Our extensive experimental evaluation shows that PUP outperforms unsupervised state-of-the-art paraphrasing techniques on four real datasets, in terms of both automatic metrics and user studies. We also show that PUP outperforms domain-adapted supervised algorithms on several datasets, and that it achieves a good trade-off between semantic similarity and diversity of expression.
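The abstract says PUP's reward combines semantic adequacy, fluency, and expression diversity. A toy sketch of such a combined reward, with stand-in scorers (lexical overlap as an adequacy proxy, a constant as a fluency placeholder) and illustrative weights; the paper's actual components are learned metrics, not these:

```python
def jaccard(a, b):
    """Word-set overlap between two sentences, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def pup_reward(source, paraphrase, weights=(0.4, 0.3, 0.3)):
    """Scalar reward blending the three PUP criteria.

    Stand-ins (hypothetical, for illustration only):
      - adequacy:  lexical overlap with the source
      - fluency:   fixed 1.0 (a language model score in practice)
      - diversity: 1 - overlap, rewarding new wording
    Note the built-in tension: adequacy and diversity pull in
    opposite directions, which is exactly what the reward balances.
    """
    wa, wf, wd = weights
    adequacy = jaccard(source, paraphrase)
    fluency = 1.0
    diversity = 1.0 - adequacy
    return wa * adequacy + wf * fluency + wd * diversity
```

In the paper's setup, this kind of score is computed per iteration over the DRL model's output, with no parallel reference sentence required.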
Symmetric Weighted First-Order Model Counting
The FO Model Counting problem (FOMC) is the following: given a sentence Φ in FO and a number n, compute the number of models of Φ over a domain of size n; the Weighted variant (WFOMC) generalizes the problem by associating a weight to each tuple and defining the weight of a model to be the product of the weights of its tuples. In this paper we study the complexity of symmetric WFOMC, where all tuples of a given relation have the same weight. Our motivation comes from an important application, inference in Knowledge Bases with soft constraints, like Markov Logic Networks, but the problem is also of independent theoretical interest. We study both the data complexity and the combined complexity of FOMC and WFOMC. For the data complexity we prove the existence of an FO^3 formula for which FOMC is #P_1-complete, and the existence of a Conjunctive Query for which WFOMC is #P_1-complete. We also prove that all γ-acyclic queries have polynomial-time data complexity. For the combined complexity, we prove that, for every fragment FO^k, k ≥ 2, the combined complexity of FOMC (or WFOMC) is #P-complete.

Comment: To appear at PODS'15
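To make the FOMC definition concrete, here is a brute-force sketch for one fixed sentence, ∀x ∀y (R(x,y) → R(y,x)): enumerate every interpretation of a single binary relation R over a domain of size n and count the satisfying ones. This is purely illustrative of the problem statement, not of the paper's algorithms, and it runs in time 2^(n²):

```python
from itertools import product

def fomc_symmetric(n):
    """Brute-force FOMC for the sentence  forall x forall y (R(x,y) -> R(y,x))
    over the domain {0, ..., n-1}.

    Each of the 2^(n^2) subsets of domain x domain is one interpretation
    of R; a model is an interpretation where R is symmetric.
    """
    domain = range(n)
    pairs = [(x, y) for x in domain for y in domain]
    count = 0
    for bits in product([False, True], repeat=len(pairs)):
        R = dict(zip(pairs, bits))
        # Check the implication R(x,y) -> R(y,x) for every pair.
        if all((not R[(x, y)]) or R[(y, x)] for x in domain for y in domain):
            count += 1
    return count
```

For this sentence the count has a closed form, 2^(n(n+1)/2) (free diagonal plus one free bit per unordered off-diagonal pair), which the brute force reproduces; the point of the paper is characterizing when such counting can be done efficiently in general.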
GIANT: Scalable Creation of a Web-scale Ontology
Understanding what online users may pay attention to is key to content recommendation and search services. These services would benefit from a highly structured, web-scale ontology of entities, concepts, events, topics, and categories. While existing knowledge bases and taxonomies embody a large volume of entities and categories, we argue that they fail to discover properly grained concepts, events, and topics in the language style of the online population, nor do they maintain a logically structured ontology among these notions. In this paper, we present GIANT, a mechanism to construct a user-centered, web-scale, structured ontology containing a large number of natural-language phrases conforming to user attentions at various granularities, mined from a vast volume of web documents and search click graphs. Various types of edges are also constructed to maintain a hierarchy in the ontology. We present the graph-neural-network-based techniques used in GIANT and evaluate the proposed methods against a variety of baselines. GIANT has produced the Attention Ontology, which has been deployed in various Tencent applications involving over a billion users. Online A/B testing performed on Tencent QQ Browser shows that the Attention Ontology can significantly improve click-through rates in news recommendation.

Comment: Accepted as a full paper at SIGMOD 2020
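The abstract describes an ontology of typed nodes (entities, concepts, events, topics, categories) held together by hierarchy-maintaining edges. A minimal, hypothetical sketch of such a structure; GIANT's actual data model and edge types are not specified at this level, so the names and the single "isA" edge type here are assumptions:

```python
from collections import defaultdict

class AttentionOntology:
    """Toy typed, hierarchical ontology (illustrative API only)."""

    NODE_TYPES = {"entity", "concept", "event", "topic", "category"}

    def __init__(self):
        self.node_type = {}                # phrase -> node type
        self.children = defaultdict(set)   # "isA" edges: parent -> children

    def add_node(self, phrase, ntype):
        if ntype not in self.NODE_TYPES:
            raise ValueError(f"unknown node type: {ntype}")
        self.node_type[phrase] = ntype

    def add_isa(self, child, parent):
        """Record that `child` sits below `parent` in the hierarchy."""
        self.children[parent].add(child)

    def descendants(self, node):
        """All phrases below `node`, via iterative depth-first traversal."""
        out, stack = set(), [node]
        while stack:
            for c in self.children[stack.pop()]:
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out
```

A recommender could then match a document's phrases against `descendants()` of a category a user attends to, which is the kind of lookup the deployed Attention Ontology would need to serve at scale.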
Open Question Answering
Thesis (Ph.D.)--University of Washington, 2014. For the past fifteen years, search engines like Google have been the dominant way of finding information online. However, search engines break down when presented with complex information needs expressed as natural language questions. Further, as more people access the web from mobile devices with limited input/output capabilities, the need for software that can interpret and answer questions becomes more pressing. This dissertation studies the design of Open Question Answering (Open QA) systems that answer questions by reasoning over large, open-domain knowledge bases. Open QA systems face two challenges. The first challenge is knowledge acquisition: how does the system acquire and represent the knowledge needed to answer questions? I describe a simple and scalable information extraction technique that automatically constructs an open-domain knowledge base from web text. The second challenge is question interpretation: how does the system robustly map questions to queries over its knowledge? I describe algorithms that learn to interpret questions by leveraging massive amounts of data from community QA sites like WikiAnswers. This dissertation shows that combining information extraction with community-QA data can enable Open QA at a much larger scale than was previously possible.
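The two challenges above can be seen in miniature with a triple-store knowledge base and a single hand-written question pattern. This is a deliberately naive stand-in (the KB contents and the one pattern are invented for illustration); the dissertation's systems instead extract triples from web text and learn interpretation from community-QA data:

```python
# Toy open-domain KB of (subject, relation, object) triples.
KB = [
    ("paris", "is-capital-of", "france"),
    ("berlin", "is-capital-of", "germany"),
]

def interpret(question):
    """Map 'What is the capital of X?' to the query (?s, is-capital-of, X).

    A single hard-coded pattern, standing in for learned question
    interpretation; unrecognized questions yield no query.
    """
    prefix = "what is the capital of "
    q = question.lower().rstrip("?").strip()
    if q.startswith(prefix):
        return ("?s", "is-capital-of", q[len(prefix):].strip())
    return None

def answer(question, kb=KB):
    """Run the interpreted query against the KB, returning subjects."""
    query = interpret(question)
    if query is None:
        return []
    _, rel, obj = query
    return [s for (s, r, o) in kb if r == rel and o == obj]
```

Scaling both sides of this sketch — acquiring millions of triples automatically, and learning many question-to-query mappings rather than hand-writing them — is exactly the gap the dissertation addresses.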